Automatisk splitting av sammensatte ord - et lingvistisk hjelpemiddel for tekstsøking (Automatic splitting of compound words - A linguistic aid for text search) [In Norwegian]

Authors

  • Tove Fjeldvig
  • Anne Golden
Abstract

Compound words cause problems in various forms of automatic analysis of the vocabulary in a text, e.g. in frequency studies. The problem is that the meaning of a compound word can in many cases also be expressed by a phrase containing the corresponding non-compound words. In text search, for example, compound words can cause a searcher to miss the documents being sought because the wording of the search argument does not match the wording of the documents. If, for example, one searches only for a compound word without dividing it into its individual components, one will not find the texts in which all the components of the compound word are mentioned but detached from one another. Against this background, a method for automatic splitting of compound words was developed. The method is based on a set of approx. 1000 rules and not on a lexicon.
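The search problem described in the abstract can be illustrated with a minimal Python sketch. It does not reproduce the authors' rule set, which is not given here: the split of the compound is supplied by hand, and the function names and the two sample documents are hypothetical, chosen only for illustration.

```python
import re

def tokenize(text):
    """Lowercase a text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def matches_whole(query, document):
    """True if the unsplit query word occurs as a token in the document."""
    return query.lower() in tokenize(document)

def matches_parts(parts, document):
    """True if every component of a split compound occurs in the document."""
    tokens = tokenize(document)
    return all(part.lower() in tokens for part in parts)

# Assumption: the split is written by hand here, not computed by rules.
query = "tekstsøking"
parts = ["tekst", "søking"]

# Hypothetical sample documents.
documents = [
    "Metoder for tekstsøking i store arkiver",
    "Søking i tekst krever gode indekser",
]

whole_hits = [matches_whole(query, d) for d in documents]
part_hits = [matches_parts(parts, d) for d in documents]
print(whole_hits)
print(part_hits)
```

Searching for the unsplit compound finds only the first document, while searching for the components finds only the second, where they appear detached from one another. A search that combines both conditions would retrieve both documents.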


Similar resources

Splitting of Compound Terms in non-Prototypical Compounding Languages

Compounding is present in a large variety of languages in different proportions. The compound rate in a text obviously depends on the language, but also on the genre and the domain. Scientific and technical texts are especially conducive to compounding, even in languages that are not traditionally regarded as highly compounding ones. In this article we address compound splitting of specialize...


Text Segmentation into Paragraphs Based on Local Text Cohesion

The problem of automatic text segmentation is subcategorized into two different problems: thematic segmentation into rather large topically self-contained sections and splitting into paragraphs, i.e., lexico-grammatical segmentation at a lower level. In this paper we consider the latter problem. We propose a method of reasonably splitting text into paragraphs based on a text cohesion measure. Speci...


Integrated JIT Lot-Splitting Model with Setup Time Reduction for Different Delivery Policy using PSO Algorithm

This article develops an integrated JIT lot-splitting model for a single supplier and a single buyer. The model considers setup time reduction, and the optimal lot size is obtained under the reduced setup time through joint optimization for both buyer and supplier, under deterministic conditions with a single product. Two cases are discussed: Single Delivery (SD) case, and Multi...


Key Issues in Vowel Based Splitting of Telugu Bigrams

Splitting compound Telugu words into their components or root words is one of the important, tedious and yet inaccurate tasks of Natural Language Processing (NLP). Except in a few special cases, at least one vowel is necessarily involved in Telugu conjunctions. As a result, vowels are often repeated as they are or are converted into other vowels or consonants. This paper describes issues invol...


A Sandhi Splitter for Malayalam

Sandhi splitting is the primary task for computational processing of text in Sanskrit and Dravidian languages. In these languages, words can join together with morpho-phonemic changes at the point of joining. This phenomenon is known as Sandhi. A sandhi splitter splits the string of conjoined words into individual words. Accurate execution of sandhi splitting is crucial for text processing tasks ...



Publication date: 1985